
Conversation

@ajay-mk
Member

@ajay-mk ajay-mk commented Jul 14, 2025

This PR introduces the ability to compare benchmark performance on CI for PRs.

There are two parts to the workflow:

  1. A Python script, benchmark_compare.py, which compares the performance of two commits using their benchmark outputs (a minimal sketch of the comparison step follows this list).
    How to use:
python3 benchmark_compare.py abc123 def456
python3 benchmark_compare.py master your/feature/branch

The script is self-contained and can be run locally. All build-related variables (e.g., CMake flags) are defined inside the script.

  2. A benchmark comparison workflow that can be triggered through a comment on the PR or by manual dispatch. The workflow uses the Python script to compare the benchmark outputs of the base and head commits and reports the results as a comment via the github-actions bot. (See example at Expt: Dummy Branch ajay-mk/SeQuant#4 (comment))
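
For reference, here is a minimal sketch of just the comparison step (not the actual script from this PR), assuming Google Benchmark's JSON output schema, i.e. a top-level "benchmarks" array whose entries carry "name", "cpu_time", and "real_time":

    # Minimal sketch: compare two Google Benchmark JSON outputs by name.
    import json
    import sys

    def load_results(path, metric="cpu_time"):
        """Map benchmark name -> measured time for the chosen metric."""
        with open(path) as f:
            data = json.load(f)
        return {b["name"]: b[metric] for b in data["benchmarks"]}

    def compare_benchmarks(base_path, new_path, metric="cpu_time"):
        base = load_results(base_path, metric)
        new = load_results(new_path, metric)
        # Only benchmarks present in both runs can be compared.
        for name in sorted(base.keys() & new.keys()):
            diff = new[name] - base[name]
            pct = 100.0 * diff / base[name] if base[name] else float("nan")
            print(f"{name}: {base[name]:.1f} -> {new[name]:.1f} ({pct:+.1f}%)")

    if __name__ == "__main__":
        compare_benchmarks(sys.argv[1], sys.argv[2])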

Caveats: This isn't 100% foolproof. GitHub-hosted runners can produce noisy benchmark results because of shared hardware. From the testing I did, this happens only rarely; it's not perfect, but it should help us catch major performance regressions.
Notes: Ideally, we should run this on a self-hosted runner to get reliable numbers, but GitHub advises against that for public repos because of security concerns.

This is my first time wiring up a somewhat complicated CI workflow, so I'm open to suggestions or improvements.

@ajay-mk ajay-mk requested a review from Copilot July 14, 2025 14:49


Why: When writing JSON/CSV, the standard locale set by `set_locale` breaks the JSON/CSV structure because large numbers get thousands-separator commas.
@ajay-mk ajay-mk force-pushed the ajay/feature/benchmark-workflow branch from f96974e to 71ba004 Compare July 14, 2025 14:54
ajay-mk added 4 commits July 14, 2025 10:55
The presence of commas in numbers breaks JSON/CSV output.
This script can be used to compare the performance of two commits. It checks out, builds, and compares the benchmark outputs of both commits.

How to use:
python3 benchmark_compare.py abc123 def456
python3 benchmark_compare.py master your/feature/branch
This is a workflow that can be triggered by commenting "/benchmark" on PRs (or through manual dispatch).

It uses `benchmark_compare.py` to run and compare benchmarks. The comparison result is posted as a comment on the PR.
@ajay-mk ajay-mk force-pushed the ajay/feature/benchmark-workflow branch from 71ba004 to bfe6010 Compare July 14, 2025 14:55
@Krzmbrzl
Collaborator

Google Benchmark ships with a dedicated Python script to compare benchmarks. I recommend making use of that rather than rewriting something similar ourselves.

Additionally, I think we should run the entire benchmark and not only the CC one.

@ajay-mk
Member Author

ajay-mk commented Jul 14, 2025

Google Benchmark ships with a dedicated Python script to compare benchmarks. I recommend making use of that rather than rewriting something similar ourselves.

I did see that. It outputs just the differences; I wanted to see the percentage differences also. I will check if there is a way to reuse it.

Additionally, I think we should run the entire benchmark and not only the CC one.

Yes, we do run the sequant_benchmarks target in this case. I added the cc target just for convenience.

ajay-mk and others added 3 commits July 16, 2025 14:47
- When benchmarking with branch names, the presence of '/' can cause issues, so replace it with '-' (see the sketch below)
- Output filename changed to "benchmark_comparison.txt"
- Add the ability to import the compare_benchmarks function for independent use
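
For illustration, the sanitization amounts to something like this (the helper name below is hypothetical):

    # Hypothetical helper: make a git ref safe for use in an output filename.
    def ref_to_filename(ref: str) -> str:
        return ref.replace("/", "-")

    assert ref_to_filename("your/feature/branch") == "your-feature-branch"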
@ajay-mk
Member Author

ajay-mk commented Jul 17, 2025

I have made a couple of changes:

  • Fixed an issue with benchmark output naming
  • The output file is now called benchmark-comparison.txt
  • Also made the compare_benchmarks function more reusable (a usage sketch follows this list). I can separate it into another file if needed; then, if we have two benchmark outputs, we can just call python3 compare.py base.json new.json
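
If it is split out like that, importing it from another script could look like the following (the signature is assumed here, mirroring the sketch earlier in this thread):

    # Assumed signature: compare_benchmarks(base_json, new_json, metric).
    from benchmark_compare import compare_benchmarks

    compare_benchmarks("base.json", "new.json", metric="cpu_time")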

@ajay-mk ajay-mk requested a review from Copilot July 17, 2025 17:37

Copilot AI left a comment


Pull Request Overview

This PR adds CLI and CI support for benchmarking performance regressions, including locale formatting controls in the runtime, a custom CMake target for a specific benchmark filter, and a GitHub Actions workflow to compare benchmark outputs between two commits.

  • Introduce disable_thousands_separator API and call it in benchmarks/main.cpp to ensure no digit grouping in JSON/CSV outputs.
  • Add sequant_benchmark_cc custom CMake target for running only the cc_full benchmark.
  • Create a workflow (benchmark_compare.yml) that triggers on PR comments or manual dispatch to run comparisons via benchmark_compare.py.

Reviewed Changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.

Summary per file:
  • benchmarks/main.cpp: Call disable_thousands_separator() right after setting the locale
  • benchmarks/CMakeLists.txt: Add sequant_benchmark_cc custom target to run the cc_full filter
  • SeQuant/core/runtime.hpp: Declare disable_thousands_separator() in the public API
  • SeQuant/core/runtime.cpp: Implement disable_thousands_separator() via locale facet overrides
  • .github/workflows/benchmark_compare.yml: Add workflow to compare benchmark outputs and post results as a comment
Comments suppressed due to low confidence (3)

.github/workflows/benchmark_compare.yml:131

  • The workflow uploads ${base_sha}-${head_sha}-comparison.txt but reads benchmark-comparison.txt. Update the filename to match the actual uploaded artifact path or use the dynamic names when reading.
            const comparisonFile = `benchmark-comparison.txt`;

benchmarks/CMakeLists.txt:25

  • [nitpick] The custom target name sequant_benchmark_cc is ambiguous—consider aligning naming with sequant_benchmarks_cc or adding a more descriptive suffix to clarify it runs only the cc_full benchmark.
add_custom_target(sequant_benchmark_cc

benchmarks/main.cpp:13

  • [nitpick] Consider adding a brief inline comment explaining why disabling the thousands separator is needed in this context, e.g., to ensure CSV/JSON parsers receive ungrouped numbers.
  disable_thousands_separator();

@ajay-mk ajay-mk force-pushed the ajay/feature/benchmark-workflow branch from 2107abf to 9667369 Compare July 17, 2025 17:43
@ajay-mk ajay-mk marked this pull request as ready for review July 17, 2025 18:19
@Krzmbrzl
Collaborator

I wanted to see the percentage differences also

It should be pretty straightforward to compute these from the absolute differences, no? The benefit of making use of the "built-in" comparison would be that that script (almost certainly) will get updated along with any potential file format changes of the benchmark output.
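
For reference, the arithmetic in question is just the ratio of the absolute difference to the baseline:

    # Relative change, in percent, from a baseline value and an absolute difference.
    def pct_change(baseline: float, abs_diff: float) -> float:
        return 100.0 * abs_diff / baseline

    print(pct_change(200.0, 10.0))  # 5.0, i.e. 5% slower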

Collaborator

@Krzmbrzl Krzmbrzl left a comment


👀

Comment on lines +54 to +55
if metric not in ["cpu_time", "real_time"]:
raise ValueError("Invalid metric specified. Use 'cpu_time' or 'real_time'.")
Collaborator


I would recommend using an enum for the metric. This will make it obvious which values are valid.
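
A sketch of that suggestion in Python:

    from enum import Enum

    class Metric(Enum):
        CPU_TIME = "cpu_time"
        REAL_TIME = "real_time"

    # Enum lookup by value raises ValueError for any other string,
    # so the manual membership check becomes unnecessary.
    metric = Metric("cpu_time")    # -> Metric.CPU_TIME
    # Metric("wall_time")          # -> ValueError: 'wall_time' is not a valid Metric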

@ajay-mk ajay-mk marked this pull request as draft October 1, 2025 03:41